DATA: Differentiable ArchiTecture Approximation

Neural Information Processing Systems

Neural architecture search (NAS) is inherently subject to a gap between the architectures used during searching and those used during validating. To bridge this gap, we develop Differentiable ArchiTecture Approximation (DATA) with an Ensemble Gumbel-Softmax (EGS) estimator to automatically approximate architectures during searching and validating in a differentiable manner. Technically, the EGS estimator consists of a group of Gumbel-Softmax estimators and is capable of converting probability vectors to binary codes and passing gradients from binary codes back to probability vectors. Benefiting from such modeling, during searching, the architecture parameters and network weights in the NAS model can be jointly optimized with standard back-propagation, yielding an end-to-end learning mechanism for searching deep models in a large enough search space. Consequently, during validating, a high-performance architecture that closely approximates the one learned during searching is readily built. Extensive experiments on a variety of popular datasets show that our method is capable of discovering high-performance architectures for image classification, language modeling and semantic segmentation, while guaranteeing the requisite efficiency during searching.
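To make the idea of converting a probability vector into a binary code more concrete, the following is a minimal forward-pass sketch, not the authors' implementation. It assumes that the ensemble combines the one-hot outputs of several independent Gumbel-Softmax samples by element-wise OR, and that the probabilities are strictly positive; the function names, the temperature `tau`, and the `seed` parameter are hypothetical. Gradient passing is not shown: in an autodiff framework the hard code would typically be paired with the relaxed sample through a straight-through style trick, which is an assumption here rather than something the abstract spells out.

```python
import math
import random


def sample_gumbel(rng):
    # Gumbel(0, 1) noise via inverse transform sampling; u is kept away from 0.
    u = max(rng.random(), 1e-12)
    return -math.log(-math.log(u))


def gumbel_softmax(probs, tau, rng):
    # Relaxed sample: softmax((log p_k + g_k) / tau), computed stably.
    logits = [(math.log(max(p, 1e-12)) + sample_gumbel(rng)) / tau for p in probs]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot_argmax(soft):
    # Discretize a relaxed sample into a one-hot vector.
    k = max(range(len(soft)), key=soft.__getitem__)
    return [1 if i == k else 0 for i in range(len(soft))]


def ensemble_gumbel_softmax(probs, num_estimators, tau=1.0, seed=None):
    # Assumption: the ensemble is the element-wise OR of the one-hot samples,
    # so the resulting binary code can switch on several operations at once.
    rng = random.Random(seed)
    code = [0] * len(probs)
    for _ in range(num_estimators):
        hard = one_hot_argmax(gumbel_softmax(probs, tau, rng))
        code = [c | h for c, h in zip(code, hard)]
    return code


if __name__ == "__main__":
    # Four candidate operations on one edge, three estimators in the ensemble.
    print(ensemble_gumbel_softmax([0.1, 0.2, 0.3, 0.4], num_estimators=3, seed=0))
```

With a single estimator the code is always one-hot; with more estimators, between one and `num_estimators` entries can be active.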


Reviews: DATA: Differentiable ArchiTecture Approximation

Neural Information Processing Systems

The paper takes the Gumbel-Softmax trick in SNAS [1] further by ensembling Gumbel-Softmax estimators. As a result, it has a richer sample space while still being efficient. Rather than using the credit assignment approach in SNAS, DATA makes use of differentiability to update the probability vector. The paper is well written and clearly motivates the proposed approach. I am convinced that the proposed EGS estimator can bridge the gap between the architectures used in searching and validating, which is a well-known issue in DARTS [2]. The argument for the richer search space of the EGS estimator is backed up by the experiments.
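The "richer sample space" remark can be illustrated with a small count. This is a sketch under the assumption that an ensemble of M one-hot samples is combined by element-wise OR and that every operation has non-zero probability, so any non-empty subset of at most M operations is reachable; the function name `num_reachable_codes` is hypothetical and not taken from the paper.

```python
from math import comb


def num_reachable_codes(num_ops, num_estimators):
    # A union of M one-hot vectors over K operations can be any non-empty
    # subset with at most min(K, M) active entries.
    upper = min(num_ops, num_estimators)
    return sum(comb(num_ops, i) for i in range(1, upper + 1))


if __name__ == "__main__":
    # A single Gumbel-Softmax estimator picks exactly one of 7 operations.
    print(num_reachable_codes(7, 1))  # 7
    # Two estimators can also reach every pair of operations.
    print(num_reachable_codes(7, 2))  # 28
```

Under this assumption the number of reachable codes grows quickly with the number of estimators, while each additional estimator adds only one more sample per edge.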



DATA: Differentiable ArchiTecture Approximation

Chang, Jianlong, Zhang, Xinbang, Guo, Yiwen, Meng, Gaofeng, Xiang, Shiming, Pan, Chunhong
